Contributor

@rshkv rshkv commented Nov 26, 2021

What changes were proposed in this pull request?

This raises Spark's minimum supported Pandas version to 1.0.0. If the installed version is lower, Spark fails with: "Pandas >= 1.0.0 must be installed; however, your version was ..."

Why are the changes needed?

Some of the Pandas-on-Spark tests do not pass with Pandas < 1.0, see SPARK-37465.

Does this PR introduce any user-facing change?

Yes. Users with an installed Pandas version below 1.0 will see failures. Pandas 1.0 also introduces breaking changes (listed here) that should not affect Spark's interaction with Pandas, but they might break user environments that get their Pandas version transitively through PySpark.

How was this patch tested?

Existing tests. The Pandas version used in GitHub Actions is 1.3.3. I'll verify that the tests also pass with 1.0.0.

@AmplabJenkins

Can one of the admins verify this patch?

@rshkv rshkv changed the title from "[SPARK-37465][Python] Raise minimum supported Pandas version to 1.0.0" to "[SPARK-37465][Python][WIP] Raise minimum supported Pandas version to 1.0.0" on Nov 26, 2021
@sarutak
Member

sarutak commented Nov 26, 2021

It's already a work in progress in #34717, isn't it?

@Yikun
Member

Yikun commented Nov 27, 2021

Yep, as @sarutak mentioned. @rshkv, would you mind helping to review #34717?

@rshkv
Contributor Author

rshkv commented Nov 27, 2021

Ah didn't see, thanks guys. Will take a look @Yikun.

@rshkv rshkv closed this Nov 27, 2021